Distinguishing the opponents in the prisoner dilemma in well-mixed populations 
Here we study the effects of adopting different strategies against different opponents, instead of adopting the same strategy against all of them, in the prisoner dilemma played in well-mixed populations. We consider an evolutionary process in which strategies that provide reproductive success are imitated and players replace one of their worst interactions by the new one. We set individuals in a well-mixed population so that the network reciprocity effect is excluded, and we analyze both synchronous and asynchronous updates. As a consequence of the replacement rule, we show that mutual cooperation is never destroyed and that the initial fraction of mutual cooperation is a lower bound for the level of cooperation. We show by simulation and mean-field analysis that for synchronous update cooperation dominates, while for asynchronous update only the cooperation associated with the initial mutual cooperation is maintained. As a side effect of the replacement rule, an "implicit punishment" mechanism emerges in which exploitation is always neutralized, providing evolutionary stability for cooperation.
1. Introduction



The cooperative dilemma was initially studied in the framework of classical game theory. Usually individuals have two strategies: cooperation and defection. A cooperator provides a benefit to the opponent and pays a cost for that. A defector receives the benefit if the opponent is a cooperator. This defines a material payoff. If individuals maximize their material payoff, it is well known that defection will dominate [1]. Departing from these initial ideas, evolutionary game theory emerged and the evolution of strategies in populations was studied. This approach implicitly assumes the principle of natural selection, where the payoff is equated to fitness and the fittest strategy survives. In this context it was shown that the classical theory is recovered in the replicator equation, where the population is considered to be well-mixed, that is, a population where everybody interacts with everybody.

Cooperation cannot be supported without extra mechanisms [1]. Essentially, two actions are needed for cooperation to survive: maintenance of mutual cooperation and prevention of exploitation. Cooperators can be better off only if they meet each other, so that their profits exceed the defectors' profits. If individuals perceive that what the opponents are doing matters for these two essential actions, reciprocal preferences can arise. Reciprocity means that what an individual does depends on what others do to him/her, directly or indirectly. Direct reciprocity means that I choose what to do against you depending on what you do to me. Indirect reciprocity means that my behavior toward you also depends on what you do to others. Another subtle form of reciprocity is network reciprocity. Individuals are set on the vertices of a network and interact only with their neighbors. In this context, cooperators form clusters of mutual cooperation and this mutualism is viewed as reciprocity. But human behavior is not so simple: individuals can adopt reciprocal strategies but, motivated by internal emotions such as anger against exploitation [15], they can also punish defectors [15-17]. This would not be so intriguing, as it is just another kind of reciprocal motive. But the important feature is that individuals usually impose costs on defectors at their own expense. This behavior is called altruistic punishment, because individuals pay a cost to punish even if they never meet the punished opponent again, and because the punishment weakens the defectors and the entire population gets better off [15]. Reputation, rewards and repeated interaction, as internal motives, all interact with punishment motives. Punishment involves some subtle questions and gives rise to another evolutionary puzzle: altruistic punishment, although seemingly common, may be a maladaptive trait, as the punishers get worse payoffs.
Recently, the possibility of an "implicit punishment" was introduced, without resorting to extra individual preferences except the desire to maximize one's own gain. This was accomplished by the adoption of different strategies against different opponents in the context of network reciprocity with synchronous update [11]. Instead of playing the same strategy against all of the neighbors, individuals can choose a different strategy against each opponent. If each player updates its strategies by possibly imitating a successful random neighbor, and replaces the strategy used in the interaction that gives the worst payoff by the imitated strategy, it was shown for square lattices that cooperation is strongly supported, even for a very high defection tendency, and is robust against misjudgments of the worst interaction. The possibility of opponent differentiation introduces a mechanism of punishing without costs and without any kind of internal preference except the desire to maximize one's own payoff. We call this punishment "implicit punishment". But in that work [18], the possibility of adopting different strategies was introduced in the context of network reciprocity. What happens if network reciprocity is excluded? Here we analyze this model in well-mixed populations, which means that we exclude network reciprocity effects. The other important feature of the model is the synchronous update assumption. It is well known that results may be strikingly different if asynchronous update is used. Here we analyze the model with both synchronous and asynchronous updates. We show that cooperation still survives, although for asynchronous update it only reaches its lower bound level. We analyze the model using computer simulations and a mean-field technique.

2. The model 

Let us state the model formally. We study the prisoner dilemma in a population of size N as the scenario for the cooperation problem. We consider a well-mixed population, which means that each player can interact with everybody. The strategy vector of an individual is $S = (S_1, \ldots, S_j, \ldots, S_{N-1})$, where $S_j$ can be C (cooperation) or D (defection). So each individual is engaged in $N-1$ interactions. If in one of these interactions an individual plays C against an opponent who is playing D, we denote this interaction as (C,D) (the first entry is the strategy of the focal player and the second entry is the opponent's strategy). The payoff of a D strategy against a C strategy is $P(D,C) = b$, where $b > 1$ is the defection tendency. Using the same notation, we have $P(D,D) = \epsilon$, with $\epsilon \ll 1$, $P(C,C) = 1$ and $P(C,D) = 0$.

For synchronous update, each player interacts with the other $N-1$ players, plays one round of the game against each opponent, and earns a cumulative payoff. After that, each player randomly chooses one other player and compares their cumulative payoffs. If the opponent's cumulative payoff is larger than its own, it imitates the strategy the opponent is using against it with a probability proportional to the difference of cumulative payoffs, $|\Delta P_{cum}|/((N-1)b)$. On the other hand, if the opponent's cumulative payoff is lower than its own, the focal player keeps its strategies. If imitation takes place, the new strategy replaces the strategy used in the interaction that gives the worst payoff. Ranked from worst to best for the focal player, the interactions are (C,D), (D,D), (C,C) and (D,C).

For asynchronous update, a random individual is chosen so that it can imitate and possibly replace one of its strategies as in the synchronous case. After this individual update, the entire population plays a round of the game, each player earns a new cumulative payoff, and another random individual is chosen to update its strategies. A time step consists of N such processes.
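The update rules described above can be sketched in code. The following is a minimal illustration and not the authors' implementation: the strategy matrix `S` (with `S[i][j]` the strategy player i uses against player j), all function names, and the random tie-breaking among equally bad interactions are assumptions made for this sketch.

```python
import random

C, D = "C", "D"  # strategy labels (encoding chosen for this sketch)

def payoff(own, other, b=2.0, eps=0.001):
    """Payoff P(own, other) of a single interaction, as defined in the model."""
    if own == D:
        return b if other == C else eps
    return 1.0 if other == C else 0.0

def cumulative_payoffs(S, b=2.0, eps=0.001):
    """Cumulative payoff of every player; diagonal entries of S are ignored."""
    n = len(S)
    return [sum(payoff(S[i][j], S[j][i], b, eps) for j in range(n) if j != i)
            for i in range(n)]

def proposed_change(S, P, i, rng, b=2.0, eps=0.001):
    """Return (k, strategy) if player i imitates a random opponent, else None."""
    n = len(S)
    j = rng.choice([x for x in range(n) if x != i])
    gap = P[j] - P[i]
    if gap <= 0 or rng.random() >= gap / ((n - 1) * b):
        return None
    # Interaction with the lowest payoff; ties broken at random (assumption).
    pay = {k: payoff(S[i][k], S[k][i], b, eps) for k in range(n) if k != i}
    lowest = min(pay.values())
    k = rng.choice([k for k, p in pay.items() if p == lowest])
    return k, S[j][i]  # copy what the opponent plays against player i

def synchronous_step(S, rng, b=2.0, eps=0.001):
    """All players decide from the same payoffs, then all changes are applied."""
    P = cumulative_payoffs(S, b, eps)
    changes = [(i, proposed_change(S, P, i, rng, b, eps)) for i in range(len(S))]
    for i, change in changes:
        if change is not None:
            S[i][change[0]] = change[1]

def asynchronous_step(S, rng, b=2.0, eps=0.001):
    """N single-player updates, with payoffs recomputed before each one."""
    for _ in range(len(S)):
        P = cumulative_payoffs(S, b, eps)
        i = rng.randrange(len(S))
        change = proposed_change(S, P, i, rng, b, eps)
        if change is not None:
            S[i][change[0]] = change[1]

def random_population(n, rng, p=0.5):
    """Each strategy is C with probability p (diagonal entries are unused)."""
    return [[C if rng.random() < p else D for _ in range(n)] for _ in range(n)]

def mutual_cooperation(S):
    """Number of directed interactions in which both players cooperate."""
    n = len(S)
    return sum(1 for i in range(n) for j in range(n)
               if i != j and S[i][j] == C and S[j][i] == C)

def cooperation_fraction(S):
    """Fraction of C strategies over all N(N-1) directed interactions."""
    n = len(S)
    return sum(1 for i in range(n) for j in range(n)
               if i != j and S[i][j] == C) / (n * (n - 1))
```

A run could start from `S = random_population(40, random.Random(1))` and call `synchronous_step` or `asynchronous_step` repeatedly while recording `cooperation_fraction(S)`. Passing the random generator explicitly keeps runs reproducible.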



3. Evolutionary analysis 

We ran our simulations with populations of size N = 40 and N = 100 and evaluated the fraction of cooperation ($f_c$) adopted by all of the players in all of their interactions. If $n_c$ is the number of C strategies used in all of the interactions by all of the players, we have $0 \le n_c \le N(N-1)$ and $f_c = n_c/(N(N-1))$. The random initial configuration consists of 50% of cooperation and the averages are taken over 100 different initial conditions. We use $\epsilon = 0.001$ and we show here only the case $b = 2$, although we also simulated the model for other values of b. In fact, the value of b has no effect on the simulation or mean-field results.

Before going on, let us state one fundamental feature of the model that holds whether the update is synchronous or not. Suppose a focal player imitates a defection strategy. We claim that an interaction of type (C,C) will never be replaced by (D,C). Since the imitated opponent plays defection against the focal player, the focal player must have at least one (C,D) or (D,D) interaction. But these interactions give payoffs worse than (C,C), so (C,C) is never the worst interaction and will never be replaced. This proves the existence of a lower bound for the fraction of cooperation, given by the initial fraction of mutual cooperation. On the other hand, if the focal player imitates a D strategy, he/she will first look for (C,D) interactions. If they are present, (C,D) will be replaced by (D,D). One can see that mutual cooperation is never destroyed and every exploitation is punished.
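
As a worked instance of this bound, the following sketch assumes (as in the initial condition used in the simulations) that every strategy is set independently to C with probability $p$ at $t = 0$. The symbols $p$ and $x_{CC}(0)$, the initial fraction of interactions in which both players cooperate, are introduced here only for illustration:

```latex
f_c(t) \ge x_{CC}(0), \qquad \langle x_{CC}(0) \rangle = p^2, \qquad
p = 0.5 \;\Rightarrow\; \langle x_{CC}(0) \rangle = 0.25 .
```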

For the usual game, where each player adopts a single strategy against all of its opponents, a single defector can invade a population of cooperators in an infinite well-mixed population. The first remarkable feature of "implicit punishment" is that cooperation is evolutionarily stable in a well-mixed population for both synchronous and asynchronous update. If a mutant that adopts defection against everybody appears in a population where everybody is cooperating, the mutant initially earns a huge payoff. But as soon as others imitate it, the exploited (C,D) interactions will be replaced by (D,D), neutralizing the exploitation. What is simple, but remarkable, is that the only interactions that change are those in which the exploiter is involved, and all of the other mutual cooperations are maintained. The "implicit punishment" goes on until the mutant's cumulative payoff is equal to the cooperators' cumulative payoff. If we call $n_{exp}$ the quantity of defections adopted by the mutant exploiter, the payoff of the mutant exploiter and the payoff of the cooperators are $P_{exp} = n_{exp}(b + \epsilon)$ and $P_{coop} = N - 2$, respectively. By equating both expressions we obtain the equilibrium value

$$n_{exp} = \frac{N-2}{b+\epsilon}.$$
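
As a numerical illustration of this balance, assuming the parameter values used in the simulations (N = 40, b = 2 and $\epsilon$ = 0.001) and the payoff expressions $P_{exp} = n_{exp}(b + \epsilon)$ and $P_{coop} = N - 2$ given above:

```latex
n_{exp} = \frac{N - 2}{b + \epsilon} = \frac{38}{2.001} \approx 18.99 .
```

Under these assumptions $n_{exp}$ settles at about 19, compared with the 39 opponents against which the mutant initially defects.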

Let us now discuss the results obtained by numerical simulations with both synchronous and asynchronous updates. In the initial conditions, we set each player to cooperate with probability 0.5 against each of its opponents. This gives an initial cooperation fraction of 50%. For asynchronous update, cooperation cannot dominate the population, but it can coexist with defection and assumes values near the lower bound (25% for the initial condition assumed here). Fig. 1 shows simulation and mean-field approximation results for the asynchronous update. For the synchronous update, cooperation dominates the entire population. Fig. 2 shows the simulation and mean-field approximation results for the synchronous update. One can see in Fig. 3 that, for short times, the synchronous update behaves like the asynchronous update. At the beginning, all of the exploitations are neutralized and only the initial mutual cooperation survives. After that, cooperation starts to increase very slowly until it dominates the population. So we can define a short-time regime and a long-time regime for synchronous dynamics. The same qualitative result holds for large populations, but the simulation time becomes extremely long for synchronous update. Fig. 4 shows a finite-size analysis of the time spent to reach the minimum value of the cooperation fraction before cooperation dominates in the synchronous update. Note that as N increases, 1/T goes to zero, implying that there is no long-time regime for $N \to \infty$. On the other hand, if N is large but finite, the long-time regime is the asymptotic regime, which is characterized by the domination of cooperation.

FIG. 1: Fraction of cooperation for asynchronous update with N = 100 and b = 2.0.

FIG. 2: Fraction of cooperation for synchronous update with N = 40 and b = 2.0.

FIG. 3: Short-time regime of the cooperation fraction for synchronous update with N = 40 and b = 2.0.

The mean-field solution provides a good equilibrium analysis in well-mixed populations if the usual game is considered [22]. But in structured populations it is not a good approximation [23]. Although in the present work we deal with a well-mixed population, the nature of the "implicit punishment" model is not so simple, and it is not obvious that a mean-field approach would work. So it is remarkable that our mean-field approximation not only gives the stationary solutions but also fits the in silico time evolution reasonably well, although for the synchronous update it fits well only in the short-time regime. Let us now derive the mean-field solution. We first analyze the asynchronous update and then the synchronous one.

FIG. 4: Plot of 1/T versus 1/N, where N is the population size and T is the time to reach the minimum value of cooperation for synchronous update.

Mean-field approximation for the asynchronous update

Let us first define local and global interaction concentrations for a population of size N + 1. If in an interaction a player i adopts strategy A and its opponent adopts B, where $A, B \in \{C, D\}$, we say that player i has an (A,B) interaction. Player i can have $N_{AB}(i)$ (A,B) interactions. We define the local concentration of (A,B) as the fraction of (A,B) interactions around player i, namely $x_{AB}(i) = N_{AB}(i)/N$. For the global concentration of (A,B) we define $x_{AB} = \sum_i N_{AB}(i)/((N+1)N)$.
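
These definitions can be illustrated with a short sketch. It assumes a hypothetical strategy matrix `S` in which `S[i][j]` is the strategy ("C" or "D") that player i uses against player j. The matrix layout and the function names are not from the original work.

```python
def local_counts(S, i):
    """Counts N_AB(i) of the (A,B) interactions around player i."""
    counts = {("C", "D"): 0, ("D", "D"): 0, ("C", "C"): 0, ("D", "C"): 0}
    for j in range(len(S)):
        if j != i:
            # First entry: what i plays against j; second: what j plays against i.
            counts[(S[i][j], S[j][i])] += 1
    return counts

def global_concentrations(S):
    """Global concentrations x_AB = sum_i N_AB(i) / ((N + 1) N)."""
    players = len(S)      # population size, N + 1 in the notation of the text
    n = players - 1       # interactions per player, N in the notation of the text
    totals = {("C", "D"): 0, ("D", "D"): 0, ("C", "C"): 0, ("D", "C"): 0}
    for i in range(players):
        for pair, count in local_counts(S, i).items():
            totals[pair] += count
    return {pair: count / (players * n) for pair, count in totals.items()}

# Three players; diagonal entries are unused placeholders.
example = [[None, "C", "D"],
           ["C", None, "D"],
           ["C", "D", None]]
x = global_concentrations(example)
```

In any configuration the global concentrations of (C,D) and (D,C) coincide, because every (C,D) interaction of one player is a (D,C) interaction of its opponent.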



We first consider a typical player, which we call the focal player. We are going to study the dynamics of the local concentration of (C,D), (D,D), (C,C), and (D,C) interactions around the focal player. Let $k_1$, $k_2$, $k_3$, and $k_4$ be the numbers of such interactions around the focal player. Note that $k_1 + k_2 + k_3 + k_4 = N$. Let us assume that the probabilities of having $k_1$, $k_2$, $k_3$, and $k_4$ interactions are given by the respective global concentrations. The probability of a focal player configuration $(k_1, k_2, k_3)$ is given by

$$P_{k_1,k_2,k_3} = \frac{N!}{k_1!\,k_2!\,k_3!\,(N-k_1-k_2-k_3)!}\; x_{CD}^{k_1}\, x_{DD}^{k_2}\, x_{CC}^{k_3}\, x_{DC}^{N-k_1-k_2-k_3}.$$
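
The configuration probability is a multinomial distribution over the four interaction types. A minimal sketch of how it could be evaluated is given below. The function name and argument order are choices made for this illustration.

```python
from math import factorial

def config_probability(k1, k2, k3, n, x_cd, x_dd, x_cc, x_dc):
    """Multinomial probability of (k1, k2, k3) with k4 = n - k1 - k2 - k3."""
    k4 = n - k1 - k2 - k3
    if min(k1, k2, k3, k4) < 0:
        return 0.0  # not a valid configuration
    coefficient = factorial(n) // (
        factorial(k1) * factorial(k2) * factorial(k3) * factorial(k4))
    return coefficient * x_cd**k1 * x_dd**k2 * x_cc**k3 * x_dc**k4

def total_probability(n, x_cd, x_dd, x_cc, x_dc):
    """Sum over all configurations; equals 1 when the concentrations sum to 1."""
    return sum(config_probability(k1, k2, k3, n, x_cd, x_dd, x_cc, x_dc)
               for k1 in range(n + 1)
               for k2 in range(n - k1 + 1)
               for k3 in range(n - k1 - k2 + 1))
```

Checking that `total_probability` returns 1 for normalized concentrations is a quick sanity test of the summation limits used in the rate expressions.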



We consider the other nodes as mean-field nodes. So the local concentration of (C,D), (D,D), and (C,C) interactions around those nodes is given by the configuration vector $N(x_{CD}, x_{DD}, x_{CC})$. Now we are going to derive the rate of variation of the local concentration around the focal player.

Let us derive the rate of increase of the (D,D) local concentration. Suppose that the focal player is in a $(k_1, k_2, k_3)$ configuration. Note that the payoff of the focal player in this configuration is

$$P_F = (N - k_1 - k_2 - k_3)\,b + k_3 + k_2\,\epsilon.$$

There is just one transition that increases the number of (D,D) interactions: (C,D) to (D,D). In this replacement, the focal player adopts C and imitates an opponent that adopts D. So at least one interaction of type (C,D) must be present. But the focal player can imitate the D strategy from (C,D) or (D,D) interactions. Let us focus on the first alternative, which happens with probability $k_1/N$. The opponent associated with the (C,D) interaction has a payoff of

$$P_{F_{CD}} = b + (N-1)(x_{DC}\,b + x_{CC} + x_{DD}\,\epsilon).$$

The probability of imitating the D strategy from (C,D) is given by

$$\frac{[P_{F_{CD}} - P_F]\,\Theta(P_{F_{CD}} - P_F)}{bN},$$

where $\Theta(x) = 1$ if $x > 0$, and $\Theta(x) = 0$ otherwise. The mean rate of increase of (D,D) by one unit due to imitation from (C,D) is

$$W_{CD} = \sum_{k_1=1}^{N}\sum_{k_2=0}^{N-k_1}\sum_{k_3=0}^{N-k_1-k_2} \frac{k_1}{N}\,\frac{[P_{F_{CD}} - P_F]\,\Theta(P_{F_{CD}} - P_F)}{bN}\,P_{k_1,k_2,k_3}.$$
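
For small N this triple sum can be evaluated directly. The sketch below assumes the rate has the form $W_{CD} = \sum (k_1/N)\,[P_{F_{CD}} - P_F]\,\Theta(P_{F_{CD}} - P_F)/(bN)\,P_{k_1,k_2,k_3}$ with the multinomial weight defined earlier. The function names are hypothetical and the routine is only an illustration, not the authors' code.

```python
from math import factorial

def config_probability(k1, k2, k3, n, x_cd, x_dd, x_cc, x_dc):
    """Multinomial weight of the focal configuration (k1, k2, k3)."""
    k4 = n - k1 - k2 - k3
    coefficient = factorial(n) // (
        factorial(k1) * factorial(k2) * factorial(k3) * factorial(k4))
    return coefficient * x_cd**k1 * x_dd**k2 * x_cc**k3 * x_dc**k4

def rate_w_cd(n, x_cd, x_dd, x_cc, x_dc, b=2.0, eps=0.001):
    """Mean rate of (C,D) -> (D,D) replacements by imitation from (C,D)."""
    # Payoff of the mean-field opponent that exploits the focal player.
    opponent = b + (n - 1) * (x_dc * b + x_cc + x_dd * eps)
    total = 0.0
    for k1 in range(1, n + 1):          # at least one (C,D) must be present
        for k2 in range(n - k1 + 1):
            for k3 in range(n - k1 - k2 + 1):
                focal = (n - k1 - k2 - k3) * b + k3 + k2 * eps
                gap = opponent - focal
                if gap > 0:             # the step function Theta
                    weight = config_probability(
                        k1, k2, k3, n, x_cd, x_dd, x_cc, x_dc)
                    total += (k1 / n) * gap / (b * n) * weight
    return total
```

The cost grows as $N^3$, so this direct evaluation is only practical for small populations. The simplifications introduced later in the text avoid the sum altogether.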



Following the same lines, we obtain the rate due to imitation from (D,D), which is given by

$$W_{DD} = \sum_{k_1=1}^{N}\sum_{k_2=0}^{N-k_1}\sum_{k_3=0}^{N-k_1-k_2} \frac{k_2}{N}\,\frac{[P_{F_{DD}} - P_F]\,\Theta(P_{F_{DD}} - P_F)}{bN}\,P_{k_1,k_2,k_3}.$$



The same analysis can be done to calculate the rate of decrease of (D,D). There is just one transition involved, namely (D,D) to (C,D). At least one (D,D) must be present. The C strategy can be imitated from (C,C) or (D,C). But note that the focal player cannot currently have a (C,D) interaction, because in that case (C,D) would give the worst payoff, and imitating a C strategy would not change the number of (D,D) interactions. Following the previous steps, we can define

$$P_{k_2,k_3} = \frac{N!}{k_2!\,k_3!\,(N-k_2-k_3)!}\; x_{DD}^{k_2}\, x_{CC}^{k_3}\, x_{DC}^{N-k_2-k_3}$$

and we obtain that

$$W_{CC} = \sum_{k_2=1}^{N}\sum_{k_3=0}^{N-k_2} \frac{k_3}{N}\,\frac{[P_{F_{CC}} - P_F]\,\Theta(P_{F_{CC}} - P_F)}{bN}\,P_{k_2,k_3},$$

$$W_{DC} = \sum_{k_2=1}^{N}\sum_{k_3=0}^{N-k_2} \frac{N-k_2-k_3}{N}\,\frac{[P_{F_{DC}} - P_F]\,\Theta(P_{F_{DC}} - P_F)}{bN}\,P_{k_2,k_3}\,(1 - \delta_{k_3,0}\,\delta_{k_2,N}).$$



As all of the nodes have the same typical behavior, because the population is well-mixed, we can approximate the global concentrations by the local ones. The above expressions determine the rate of variation of (D,D) by one unit. If we want the time derivative of $x_{DD}$, we need to divide the expressions by N and multiply by a factor of two, because there is also the contribution of the opponents' updates. So we have

$$\frac{dx_{DD}}{dt} = \frac{2}{N}\,(W_{CD} + W_{DD} - W_{CC} - W_{DC}).$$

Following the same reasoning, one can see that $x_{CC}$ does not change in time. Finally, as all of the mean-field variables are normalized to one, we obtain that

$$\frac{dx_{CD}}{dt} = -\frac{1}{2}\left(\frac{dx_{DD}}{dt} + \frac{dx_{CC}}{dt}\right).$$
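
The step from normalization to this last relation can be made explicit. The short derivation below uses the fact that every (C,D) interaction of one player is a (D,C) interaction of its opponent, so that the global concentrations satisfy $x_{CD} = x_{DC}$ (an identity not spelled out in the text):

```latex
x_{CD} + x_{DC} + x_{DD} + x_{CC} = 1, \qquad x_{CD} = x_{DC}
\;\Rightarrow\;
2\,\frac{dx_{CD}}{dt} + \frac{dx_{DD}}{dt} + \frac{dx_{CC}}{dt} = 0
\;\Rightarrow\;
\frac{dx_{CD}}{dt} = -\frac{1}{2}\left(\frac{dx_{DD}}{dt} + \frac{dx_{CC}}{dt}\right).
```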

We can simplify these expressions further if we replace $k_1$, $k_2$ and $k_3$ inside the payoff expressions of the focal player by the expected values of such quantities given by the configuration probability: $N x_{CD}$, $N x_{DD}$, and $N x_{CC}$, respectively. With this extra approximation the $\Theta$ function can be easily evaluated in the limit of large N and we have the following equation:

$$\frac{dx_{DD}}{dt} = \frac{2}{N^2}\left\{ x_{CD} + \frac{\epsilon}{b}\,x_{DD}\left[1 - (1 - x_{CD})^{N-1}\right] - \frac{x_{CC}}{b}\left[(1 - x_{CD})^{N-1} - (x_{CC} + x_{DC})^{N-1}\right] \right\}.$$
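
A rough sketch of how such a rate equation could be integrated numerically is given below. It assumes that the right-hand side has the form $\frac{2}{N^2}\{x_{CD} + \frac{\epsilon}{b}x_{DD}[1-(1-x_{CD})^{N-1}] - \frac{x_{CC}}{b}[(1-x_{CD})^{N-1} - (x_{CC}+x_{DC})^{N-1}]\}$, that $x_{CC}$ is constant, and that $x_{CD} = x_{DC}$. The forward-Euler scheme, the step size and all function names are choices made for this illustration, and the overall prefactor only sets the time scale.

```python
def dx_dd_dt(x_cd, x_dd, x_cc, x_dc, n, b=2.0, eps=0.001):
    """Right-hand side of the rate equation for x_DD (assumed form)."""
    tail_cd = (1.0 - x_cd) ** (n - 1)      # weight of having no (C,D) interaction
    tail_c = (x_cc + x_dc) ** (n - 1)      # weight of having no (C,D) and no (D,D)
    gain = x_cd + (eps / b) * x_dd * (1.0 - tail_cd)
    loss = (x_cc / b) * (tail_cd - tail_c)
    return 2.0 / n**2 * (gain - loss)

def integrate_asynchronous(x_cd, x_dd, x_cc, n, b=2.0, eps=0.001,
                           dt=1.0, steps=1000):
    """Forward-Euler integration; returns the final state and f_c over time."""
    x_dc = x_cd                            # (C,D) and (D,C) concentrations coincide
    cooperation = [x_cc + x_cd]            # f_c = x_CC + x_CD
    for _ in range(steps):
        rate = dx_dd_dt(x_cd, x_dd, x_cc, x_dc, n, b, eps)
        x_dd += dt * rate
        x_cd -= 0.5 * dt * rate            # normalization with dx_CC/dt = 0
        x_dc = x_cd
        cooperation.append(x_cc + x_cd)
    return (x_cd, x_dd, x_cc, x_dc), cooperation

# Random initial condition with 50% cooperation: all concentrations are 0.25.
state, f_c = integrate_asynchronous(0.25, 0.25, 0.25, n=40, dt=10.0, steps=2000)
```

With these assumptions the cooperation fraction starts at 0.5 and relaxes to a value slightly above the lower bound of 0.25, in line with the behavior described for the asynchronous update.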



This equation can be solved numerically. Fig. 1 shows the numerical solution and the simulation results. Note that this approximation gives good results when compared with the in silico evolution. For the initial condition used here, the terms inside the parentheses are 0.75 raised to the power $N-1$ and 0.5 raised to the power $N-1$. If N is large, the terms that are raised to the power $N-1$ are very small and can be neglected, at least for short times. This gives the following simplified equations,



$$\frac{dx_{DD}}{dt} = \frac{2}{N^2}\left(x_{CD} + \frac{\epsilon}{b}\,x_{DD}\right),$$

and

$$\frac{dx_{CC}}{dt} = 0.$$

The solution of these equations is straightforward:

$$x_{DD} = x_{DD}^0 + \frac{2\,(b\,x_{CD}^0 + \epsilon\,x_{DD}^0)}{b - 2\epsilon}\left[1 - \exp\left(-\frac{1}{N^2}\left(1 - \frac{2\epsilon}{b}\right)t\right)\right],$$

and

$$x_{CC} = x_{CC}^0,$$

where the index 0 refers to the initial conditions. If we set $\epsilon = 0$, just for simplicity, the solution reaches the fixed point

$$x_{DD}^* = x_{DD}^0 + 2x_{CD}^0, \qquad x_{CC}^* = x_{CC}^0, \qquad \text{and} \qquad x_{CD}^* = x_{DC}^* = 0.$$

One can see that only the initial mutual cooperation is maintained and all of the other interactions become mutual defections. Note that all of the exploitations are neutralized. Note also that this approximation gives good results when compared to the simulation data.

Mean-field approximation for the synchronous update

Let us now treat the synchronous case. Now (C,C) can increase, because it is possible to have a (D,D) to (C,C) transition whenever two players simultaneously make a (D,D) to (C,D) transition in their shared (D,D) interaction. This is an essential feature of the synchronous model. This kind of transition does not take place in the asynchronous update, and that is the reason why cooperation assumes the lower bound value in the asynchronous case. We can approximate the rate of this transition by

$$\frac{dx_{CC}}{dt} = \frac{2}{N}\,\frac{1}{N x_{DD}}\,(W_{CC} + W_{DC})^2.$$

Let us explain the term in the denominator. If the focal player makes a (D,D) to (C,D) transition on a specific interaction, the mean-field player associated with this specific interaction should choose exactly this interaction, which happens with probability $1/(N x_{DD})$. If we perform the same simplifications that were made for the asynchronous case, we have

$$\frac{dx_{CC}}{dt} = \frac{2}{N^4 x_{DD}}\left\{\frac{x_{CC}}{b}\left[(1 - x_{CD})^{N-1} - (x_{CC} + x_{DC})^{N-1}\right]\right\}^2,$$

and

$$\frac{dx_{DD}}{dt} = \frac{2}{N^2}\left\{ x_{CD} + \frac{\epsilon}{b}\,x_{DD}\left[1 - (1 - x_{CD})^{N-1}\right] - \frac{x_{CC}}{b}\left[(1 - x_{CD})^{N-1} - (x_{CC} + x_{DC})^{N-1}\right] \right\}.$$

Fig. 2 and Fig. 3 show the numerical solution of these equations. One can see from the above equations that $x_{CC}$ increases much more slowly than $x_{DD}$. For the initial condition assumed here, the time derivative of $x_{CC}$ is almost zero at the beginning, because the values inside the brackets are of the order of 0.5 raised to the power N. But when the evolution starts, a large part of the (C,D) interactions are changed to (D,D) and $x_{CD}$ is reduced to a value near zero, making $x_{CC}$ increase faster. So we have two regimes: a short-time regime, in which $x_{CC}$ is kept almost constant around its initial value, and a long-time regime, in which $x_{CC}$ starts to increase faster. Fig. 2 and Fig. 3 show both regimes of the evolution of cooperation. For short times, if we discard the terms that are raised to the power $N-1$, we have the same solution as in the asynchronous case. This means that cooperation assumes a value near its lower bound, given by the initial mutual cooperation. But in the long-time regime, $x_{CD}$ is near zero and the growth of $x_{CC}$ cannot be neglected. So $x_{CC}$ increases until it becomes equal to one. So for sufficiently long times, the stationary solution is

$$x_{CC}^* = 1 \qquad \text{and} \qquad x_{CD}^* = x_{DC}^* = x_{DD}^* = 0.$$

Note that for the short-time regime, shown in Fig. 3, the mean-field approximation fits the in silico evolution well. For the long-time regime, the time evolution of the mean-field solution does not fit well, although it gives the right stationary solution.

From the above expressions and simulation data, we see that the lowest value of cooperation is reached very fast in the synchronous update. But as the population size gets bigger, this value is reached more slowly (see Fig. 4). Besides that, if N is large, $x_{CC}$ increases very slowly. For these reasons, for large N, in the synchronous update the population seems to be trapped at the lowest value of cooperation, although what is actually happening is that cooperation is slowly increasing, spreading until it dominates the entire population.



Conclusion 

Here we analyzed the model that allows individuals to choose different strategies against different opponents in well-mixed populations, for both synchronous and asynchronous update. First we showed that cooperation is evolutionarily stable for both synchronous and asynchronous update, which means that a defector mutant cannot invade a population of cooperators. We also showed, for an initial condition of 50% of cooperation, that for synchronous update cooperation always dominates, while for asynchronous update the cooperation fraction assumes the lower bound given by the initial mutual cooperation. For the synchronous update, the population dynamics exhibits a short-time behavior that is similar to the asynchronous update, but for sufficiently long times cooperation spreads, provided N is large but finite. The crucial difference between synchronous and asynchronous update is that in the synchronous update it is possible to have a simultaneous update that allows a (D,D) to (C,C) transition. This does not happen in the asynchronous update. In a previous work, the same model was analyzed on a square lattice with synchronous update. Here we showed that the synchronous update is crucial for cooperation dominance, while network reciprocity effects are not so important. Although the asynchronous update does not provide cooperation dominance, it allows cooperation to survive at its lower bound value. Note that this result for the asynchronous update is still better than that of the usual game.